


A wide array of microorganisms survive and thrive in extreme environments. However, we know little
about the patterns of, and controls over, their large-scale ecological distribution. To this end, we
applied bar-coded 16S rRNA gene pyrosequencing to explore the phylogenetic
differentiation among 59 microbial communities from physically and geochemically diverse acid
mine drainage (AMD) sites across Southeast China, revealing for the first time that environmental
variation is the major factor explaining community differences in these harsh environments.
Our data showed that overall microbial diversity estimates, including phylogenetic diversity,
phylotype richness and pairwise UniFrac distance, were largely correlated with pH.
Furthermore, multivariate regression tree analysis also identified solution pH as a strong predictor
of relative lineage abundance. Betaproteobacteria, mostly affiliated with the 'Ferrovum' genus, were
clearly predominant in assemblages under moderate pH conditions, whereas Alphaproteobacteria,
Euryarchaeota, Gammaproteobacteria and Nitrospira exhibited a strong adaptation to more
acidic environments. Strikingly, such pH-dependent patterns were also observed in a
subsequent comprehensive analysis of the environmental distribution of acidophilic microorganisms
based on 16S rRNA gene sequences previously retrieved from globally distributed AMD and
associated environments, regardless of long-distance isolation and distinct substrate types.
Collectively, our results suggest that microbial diversity patterns are better predicted by
contemporary environmental variation than by geographical distance in extreme AMD systems.
The ISME Journal (2013) 7, 1038-1050; doi:10.1038/ismej.2012.139; published online 22 November 2012
Introduction

Microbial biogeography is becoming an increasingly exciting topic in microbial ecology, and a growing number of studies are addressing the spatial scaling and distribution patterns of microorganisms in the environment (Martiny et al., 2006; Green et al., 2008). Despite their tremendous potential for global dispersal, there is accumulating evidence that free-living microorganisms exhibit nonrandom distribution patterns across diverse habitats at various spatial scales. Niche-based processes have been implicated as the primary drivers of the widely observed environment-dependent diversity patterns, and environmental variables such as salinity (Lozupone and Knight, 2007; Auguet et al, 2010), pH (Fierer and Jackson, 2006; Lauber et al, 2009; Rousk et al, 2010; Griffiths et al, 2011) and C:N ratio (Bates et al, 2011) have been identified as major determinants of microbial community composition. However, there is also evidence that spatial distance, which may be seen as a proxy variable representing differential community dynamics related to past historical events and disturbances (Ramette and Tiedje, 2007), has a role in structuring natural microbial assemblages (Cho and Tiedje, 2000; McAllister et al, 2011; Martiny et al, 2011). These biogeographic studies have provided initial insights into the processes that generate diversity patterns and have improved our understanding of why organisms live where they do and how they will respond to environmental change. However, studies that systematically explore microbial geographical patterns by considering contemporary environmental variation and spatial distance simultaneously are limited (Ramette and Tiedje, 2007; Ge et al, 2008), and the relative importance of these factors in shaping microbial communities in natural environments remains largely unresolved.
The Earth's extreme environments harbor a wide array of extraordinary microorganisms that are, in one way or another, similar to ancient life forms (Amaral-Zettler et al., 2011). Analyzing the dynamic changes of these microbial communities together with physical and geochemical factors will reveal how microbes adapt to and tolerate different kinds of environmental extremes, and will increase our understanding of microbial ecology and evolution. Acid mine drainage (AMD) is a widespread environmental problem resulting primarily from the oxidative dissolution of pyrite (FeS₂) and other sulfide minerals exposed to oxygen and water during metal ore mining (Nordstrom and Alpers, 1999). Although typically low in overall microbial diversity, these unique environments harbor metabolically active, acidophilic microorganisms that are well adapted to the multiple environmental stresses encountered and are mainly responsible for the
generation of these hot, sulfuric acid- and toxic metal-rich solutions (Baker and Banfield, 2003). While Acidithiobacillus ferrooxidans and Leptospirillum ferrooxidans (the two iron-oxidizing species most commonly isolated from acidic drainage waters) are widely implicated as the microorganisms that control the rate of AMD generation, more recent molecular investigations have revealed that other, less well-known organisms (for example, Ferroplasma spp. in the Archaea and Leptospirillum group III within the Nitrospira) are dominant in certain mine environments and probably have important roles in pyrite dissolution in situ (Bond et al, 2000; Tan et al, 2007; Huang et al., 2011). Because of their biological and
geochemical simplicity, AMD environments have potential as model systems for quantitative analysis of microbial ecology, evolution and community function (Baker and Banfield, 2003; Denef et al, 2010). The first 16S rRNA gene-based microbial diversity surveys of AMD systems date back to the mid-1990s (Goebel and Stackebrandt, 1994). Further molecular diversity inventories of AMD microbes have been conducted in a number of acidic environments in diverse geographical locations, including Iron Mountain in California, USA (Bond et al., 2000; Druschel et al., 2004) and the Rio Tinto (RT) in southwestern Spain (Gonzalez-Toril et al., 2003; Garcia-Moyano et al, 2007). Although they have expanded our knowledge of the biodiversity of extremely acidic systems, these studies have typically examined a limited number of samples from a single mining environment, and the sequencing depth provided by standard clone library analysis is relatively limited. Consequently, a global understanding of the pattern of AMD microbial diversity has not been available, and it is not clear how communities are shaped by the prevailing geochemical factors in these extreme environments, or whether the major environmental determinants of microbial community composition differ from
those working in 'normal' environments. The advent of high-throughput pyrosequencing technology now affords new opportunities to address these knowledge gaps by comprehensively characterizing microbial communities in large numbers of ecological samples to examine broad trends of microbial distribution in AMD environments.

Here, we applied massively parallel tag pyrosequencing of the V4 region of the 16S rRNA gene to examine in depth the microbial communities from diverse AMD sites across Southeast China and to gain insight into the ecological characteristics of these extraordinary microorganisms. We wanted to determine whether AMD microbes exhibit specific biogeographic patterns and which factors (contemporary environmental conditions versus spatial distance) are more important in explaining their diversity and composition across a broad range of physical and geochemical gradients. A meta-analysis based on previous molecular inventory studies of AMD environments from diverse geographical locations was subsequently conducted to determine whether the patterns observed in our pyrosequence data set hold at broader (global) scales.



Materials and methods 

Sample collection, physicochemical analyses and DNA extraction

Figure 1 Location of sampling sites of AMD across Southeast China. Detailed site characteristics are listed in Supplementary Tables 1 and 2.

A total of 59 AMD samples were collected from
14 mining areas (12 active and two abandoned)
across Southeast China (19.24-31.64°N, 105.71-
118.62°E; Figure 1 and Supplementary Table 1) with
different mineralogy (for example, copper, lead-zinc,
pyrite and polymetallic) and representing a
broad variety of environmental conditions. Site
locations were recorded by global positioning 
system and the geographical distances between 
sampling sites ranged from about 10 m to over 
1600 km. Water samples were taken from acidic 
streams, runoff ponds and AMD collection ponds 
(for storage before treatment) using sterile serum 
bottles and immediately kept on ice for transport to 
the laboratory. For DNA extraction, a 500 ml aliquot of each sample was coarse-filtered through a 3 μm fiber filter (Type A/D Glass; Pall Corporation, Port Washington, NY, USA) and then filtered through a 0.22 μm polyethersulfone membrane filter (Supor-200; Pall) using a peristaltic pump. The biomass retained on the polyethersulfone membranes was stored at −40 °C before nucleic acid extraction, and the filtrates were temporarily stored at 4 °C for chemical analyses within 10 days. Temperature, solution pH, dissolved oxygen (DO) and electrical conductivity (EC) were measured on-site using specific electrodes. Ferric and ferrous iron were measured by a colorimetric assay with 1,10-phenanthroline at 530 nm (Hill et al., 1978). Total organic carbon (TOC) was measured by high-temperature catalytic oxidation and infrared detection with a TOC analyzer (TOC-Vcsh; Shimadzu, Kyoto, Japan), and sulfate was determined by a BaSO₄-based turbidimetric method (Chesnin and Yien, 1951). Elemental analysis was performed by inductively coupled plasma optical emission spectrometry (Optima 2100DV; PerkinElmer, Waltham, MA, USA) after the filtrates were digested at 180 °C with conc. HNO₃ and HCl (1:3, v/v). Genomic DNA was extracted from the filters following the protocol described by Frias-Lopez et al. (2008). As an additional step to facilitate cell lysis, the membranes were placed into bead tubes and homogenized with a FastPrep-24 Homogenization System equipped with a QuickPrep Adapter (MP Biomedicals, Seven Hills, NSW, Australia) for 40 s at maximum speed.
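Site coordinates were recorded by GPS, and pairwise geographical distances (about 10 m to over 1600 km here) serve as the spatial variable that is later compared with community dissimilarity. The paper does not state how distances were derived from coordinates. The sketch below assumes great-circle (haversine) distances on a spherical Earth with a mean radius of 6371 km. The site names and coordinates are hypothetical; they simply sit at the corners of the reported sampling extent.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(sites):
    """Pairwise distances (km) for a {name: (lat, lon)} mapping, as nested dicts."""
    names = list(sites)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = haversine_km(*sites[a], *sites[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical sites placed at the corners of the reported sampling extent
sites = {
    "site_A": (19.24, 105.71),
    "site_B": (31.64, 118.62),
    "site_C": (25.00, 113.00),
}
D = distance_matrix(sites)
for a, b in combinations(sites, 2):
    print(f"{a} - {b}: {D[a][b]:.0f} km")
```

A spherical model introduces errors well below 1%, which is negligible next to the 10 m to 1600 km range of distances involved.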



Amplification and bar-coded pyrosequencing of 
bacterial and archaeal 16S rRNA genes 
PCR amplification, purification, pooling and pyrosequencing of a region of the 16S rRNA gene were performed following the procedure described by Fierer et al. (2008). We used the primer set F515 (5'-GTGCCAGCMGCCGCGGTAA-3') and R806 (5'-GGACTACVSGGGTATCTAAT-3'), which was designed to amplify the V4 hypervariable region and has been shown in silico to be universal for nearly all bacterial and archaeal taxa (Bates et al., 2011). This short target region (~300 bp) can provide sufficient resolution for accurate taxonomic classification of microbial sequences (Liu et al., 2007). An 8-bp error-correcting barcode (Hamady et al., 2008) was added to the forward primer. Samples were amplified in triplicate using the thermal cycling conditions described previously (Fierer et al., 2008). Replicate PCR reactions for each sample were pooled and purified using a QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA, USA). A single composite sample for pyrosequencing was prepared by combining approximately equimolar amounts of PCR products from each sample. Sequencing was carried out on a 454 GS FLX Titanium pyrosequencer (Roche 454 Life Sciences, Branford, CT, USA) at Macrogen (Seoul, Korea).
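Both primers contain IUPAC degenerate bases: M in F515, and V and S in R806. Each primer is therefore a small pool of oligonucleotides rather than a single sequence. As an illustration only, the sketch below expands a degenerate primer into its concrete variants using the standard IUPAC nucleotide codes. The function name `expand_primer` is hypothetical and is not part of the Fierer et al. (2008) protocol.

```python
from itertools import product

# Standard IUPAC nucleotide codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


F515 = "GTGCCAGCMGCCGCGGTAA"
R806 = "GGACTACVSGGGTATCTAAT"
for name, primer in (("F515", F515), ("R806", R806)):
    variants = expand_primer(primer)
    print(name, len(variants), variants)
```

Running this shows that F515 encodes 2 sequences and R806 encodes 6 (three choices at V times two at S).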



Processing of pyrosequencing data 
Raw data generated from the 454 pyrosequencing run were processed and analyzed following the pipelines of Mothur (Schloss et al., 2009) and QIIME (Caporaso et al., 2010). Pyrosequences were denoised in Mothur using the 'shhh.flows' command (a translation of the PyroNoise algorithm; Quince et al., 2009) and the 'pre.cluster' command (Huse et al., 2010). Chimeric sequences were identified and removed using UCHIME in de novo mode (Edgar et al., 2011). Quality sequences were subsequently assigned to samples according to their unique 8-bp barcodes and binned into phylotypes at the 97% similarity level using the average-neighbor clustering algorithm (Huse et al., 2010). Representative sequences were aligned using NAST (DeSantis et al., 2006) and then used to build neighbor-joining phylogenetic trees with FastTree (Price et al., 2009). Taxonomic classification of phylotypes was determined with the Ribosomal Database Project classifier at an 80% confidence threshold (Wang et al., 2007).
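Binning sequences into phylotypes at the 97% similarity level means that clusters are merged only while their average pairwise distance stays at or below 0.03. The sketch below is a minimal, unoptimized illustration of average-linkage (average-neighbor) agglomerative clustering with that cutoff. It assumes the sequences are already aligned to equal length and uses a simple p-distance (fraction of mismatched positions, with no gap handling), which simplifies the distances Mothur actually computes. The toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_otus(seqs, cutoff=0.03):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until that mean exceeds `cutoff`; return sorted index lists."""
    n = len(seqs)
    d = [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def mean_dist(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), mean_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break  # closest pair is already more than 3% divergent
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical aligned reads of length 100
base = "A" * 100
s1 = "GG" + base[2:]                    # 2% from base
s2 = base[:10] + "TTTTT" + base[15:]    # 5% from base, 7% from s1
s3 = "C" * 100                          # unrelated
print(average_linkage_otus([base, s1, s2, s3]))  # -> [[0, 1], [2], [3]]
```

Here `s2` stays separate because its average distance to the merged cluster {base, s1} is (0.05 + 0.07) / 2 = 0.06, which is above the 0.03 cutoff.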

We estimated the relative abundance (%) of individual taxa within each community by comparing the number of sequences assigned to a specific taxon with the total number of sequences obtained for that sample. We also calculated the number of phylotypes (richness) and Faith's index of phylogenetic diversity (Faith's PD, the sum of the lengths of all branches in the phylogenetic tree that lead to the members of a community) to compare community diversity across all 59 AMD samples. Weighted UniFrac analyses (Lozupone and Knight, 2005; Lozupone et al., 2006) were applied to calculate the pairwise distances between microbial assemblages. Diversity indices and UniFrac dissimilarities were calculated from a randomly selected subset of 540 sequences per sample. Normalizing the number of sequences per sample allowed us to control for differences in survey effort when comparing diversity indices and lineage-specific UniFrac distances across samples (Lauber et al., 2009).
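The normalization and diversity calculations described above can be sketched as follows. Each sample is randomly subsampled without replacement to 540 sequences. Richness (the number of distinct phylotypes) and Faith's PD (the total length of the branches connecting the observed phylotypes to the root) are then computed on that subsample. The tree encoding (a dict mapping each node to its parent and branch length), the function names and the toy counts are hypothetical. The study itself performed these calculations in Mothur/QIIME, not with this code.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {phylotype: count} table to `depth` reads without replacement."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return Counter(random.Random(seed).sample(pool, depth))


def richness(counts):
    """Number of phylotypes observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def faith_pd(tree, present):
    """Faith's PD: summed length of every branch on a path from an observed tip
    to the root. `tree` maps node -> (parent, branch_length); the root is absent."""
    branches = set()
    for tip in present:
        node = tip
        while node in tree:          # climb until the root (no parent entry)
            branches.add(node)       # each node names the branch above it
            node = tree[node][0]
    return sum(tree[node][1] for node in branches)


# Hypothetical rooted tree: ((A:0.1, B:0.2)n1:0.3, C:0.5)root
tree = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.3), "C": ("root", 0.5)}
sample = {"A": 400, "B": 150, "C": 50}   # hypothetical phylotype counts
sub = rarefy(sample, depth=540)
observed = [otu for otu, n in sub.items() if n]
print(sum(sub.values()), richness(sub), faith_pd(tree, observed))
```

Because only 60 of the 600 reads are discarded, phylotypes A and B always survive the subsampling. The rare phylotype C may or may not survive, which illustrates why rarefied richness is itself a random quantity.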



Data collection and beta diversity of microbial 
communities from global AMD and associated 
environments 

To reveal broader patterns in the distribution of microorganisms among globally distributed acidic environments, we searched the Web of Science and reviewed molecular inventory studies that explored microbial communities in natural AMD and associated environments (such as acidic biofilms, sediments and tailings) from diverse geographical locations. 16S rRNA clone sequences were identified and retrieved from GenBank for samples with detailed information on operational taxonomic units and their relative abundance, and environmental parameters were extracted and summarized. Community composition and weighted UniFrac dissimilarity were calculated for the subsequent meta-analysis (see detailed methods in Supplementary Information).
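Weighted UniFrac scores each branch of the tree by how differently two communities distribute their relative abundance beneath it. The sketch below implements a normalized form of weighted UniFrac that scales the score to lie between 0 (identical communities) and 1 (no shared branches). It assumes a rooted tree stored as a hypothetical dict that maps each node to (parent, branch length). The paper does not say whether the raw or normalized variant was used, so the normalization here is an assumption, exposed as a flag.

```python
def _root_distance(tree, tip):
    """Sum of branch lengths from a tip up to the root."""
    total = 0.0
    while tip in tree:
        parent, length = tree[tip]
        total += length
        tip = parent
    return total


def weighted_unifrac(tree, counts_a, counts_b, normalized=True):
    """Weighted UniFrac between two {tip: count} samples on a rooted tree.
    `tree` maps node -> (parent, branch_length); the root has no entry."""
    tot_a, tot_b = sum(counts_a.values()), sum(counts_b.values())
    below_a, below_b = {}, {}  # proportion of each sample beneath each branch
    for counts, total, below in ((counts_a, tot_a, below_a), (counts_b, tot_b, below_b)):
        for tip, n in counts.items():
            node = tip
            while node in tree:
                below[node] = below.get(node, 0.0) + n / total
                node = tree[node][0]
    # Branch length times the difference in the two samples' proportions below it
    numerator = sum(
        length * abs(below_a.get(node, 0.0) - below_b.get(node, 0.0))
        for node, (_, length) in tree.items()
    )
    if not normalized:
        return numerator
    # Normalize by the abundance-weighted root-to-tip distances of both samples
    tips = set(counts_a) | set(counts_b)
    denominator = sum(
        _root_distance(tree, tip)
        * (counts_a.get(tip, 0) / tot_a + counts_b.get(tip, 0) / tot_b)
        for tip in tips
    )
    return numerator / denominator


# Hypothetical rooted tree: ((A:0.1, B:0.2)n1:0.3, C:0.5)root
tree = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.3), "C": ("root", 0.5)}
print(weighted_unifrac(tree, {"A": 10, "B": 10}, {"A": 10, "B": 10}))  # identical samples
print(weighted_unifrac(tree, {"A": 5}, {"C": 5}))  # no shared branches
print(weighted_unifrac(tree, {"A": 1}, {"B": 1}))  # sister tips share branch n1
```

In the last example, the shared branch n1 contributes nothing to the numerator. Only the two tip branches (0.1 + 0.2) differ, giving 0.3 / 0.9, or one third.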

Statistical analyses 

All statistical analyses were implemented using various packages in the R statistical computing environment. Aggregated boosted tree (ABT) analysis (De'ath, 2007) was carried out using the gbmplus package (with 5000 trees used for boosting, 10-fold cross-validation and three-way interactions) to quantify the relative influence of environmental variables on community diversity. A sum of squares multivariate regression tree (MRT) (De'ath, 2002) was constructed using the mvpart package (with default parameters) to relate the relative abundance of lineages to site characteristics. Multiple linear regression (MLR) with stepwise selection and Mantel tests were conducted within the vegan package (Oksanen et al., 2010) to test the significance of relationships between diversity indices and site properties. For the PD, phylotype richness and relative abundance data sets of individual taxa in the subsequent MLR analyses, the independent variables pH, EC, DO, TOC, SO₄²⁻, Fe³⁺, Fe²⁺, latitude and longitude were entered into the MLR model, whereas for the weighted UniFrac dissimilarity data set, Bray-Curtis dissimilarities of these environmental variables and geographical distance were used. ABT and MRT are statistical techniques that fundamentally aim at accurate prediction and explanation of the relationships between complex ecological data (diversity indices and dissimilarity matrices in ABT, and relative lineage abundance in MRT) and environmental characteristics (De'ath, 2002, 2007). More importantly, these methods allowed us to quantify and visualize the relative contributions of environmental variables and geographical distance to community diversity.
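A Mantel test relates two distance matrices computed over the same samples. The sketch below follows the setup described above, in which environmental dissimilarity is a Bray-Curtis distance; for a single variable such as pH, this reduces to |x − y| / (x + y). It then computes a Spearman Mantel statistic with a one-sided permutation P-value. This is a pure-Python illustration, not the vegan implementation. The function names, the 999 permutations and the toy pH values are assumptions, and the "community" matrix is a hypothetical monotone transform of the pH matrix, so its Spearman r should be 1.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def _ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_spearman(d1, d2, permutations=999, seed=0):
    """Spearman Mantel r between square matrices d1 and d2, plus a one-sided
    permutation P-value from jointly permuting the rows/columns of d2."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))
    v1 = _ranks([d1[i][j] for i, j in pairs])

    def stat(perm):
        return _pearson(v1, _ranks([d2[perm[i]][perm[j]] for i, j in pairs]))

    r_obs = stat(list(range(n)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        if stat(perm) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


ph = [1.9, 2.2, 2.6, 3.0, 3.5, 4.1]                     # hypothetical sample pH
d_ph = [[bray_curtis([a], [b]) for b in ph] for a in ph]
d_comm = [[v ** 0.5 for v in row] for row in d_ph]      # hypothetical community distances
r, p = mantel_spearman(d_ph, d_comm)
print(round(r, 3), p)
```

Permuting whole rows and columns together, rather than individual cells, preserves the dependence among distances that share a sample. This is why the Mantel test uses permutation instead of the ordinary correlation P-value.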

Results 

Site characteristics and environmental conditions 
The AMD samples captured a wide range of physical and geochemical gradients (Supplementary Tables 1 and 2) and were characterized by extremely acidic pH values ranging from 1.9 to 4.1 (2.6 ± 0.45, mean ± s.d.) and high concentrations of dissolved solids (measured as EC) ranging from 134 to 20 000 (4528 ± 3786) μS cm⁻¹. Concentrations of sulfate and ferric and ferrous iron were highly variable across the samples, averaging 3787 ± 2129, 1317 ± 3915 and 130 ± 354 mg l⁻¹, respectively. Additionally, DO (3.3 ± 3.6 mg l⁻¹) and TOC (8.9 ± 12.4 mg l⁻¹) also fluctuated considerably.



Composition and diversity of AMD microbial 
communities 

The bar-coded pyrosequencing generated 131 720 quality sequences from the 59 AMD samples, with an average of 2234 ± 1756 sequences and a range of 542 to 9263 sequences per sample. All but 436 of the 131 720 sequences could be classified at the domain level (Bacteria or Archaea) by the RDP classifier (80% threshold). A total of 2198 phylotypes were defined at the 97% similarity level. The majority (~54% of the phylotypes) were represented by a single sequence, although all of these singletons could be assigned to taxa identified in the whole pyrosequence data set. The number of phylotypes detected in each sample ranged from 10 to 244, with an average of 61 ± 49, based on a subset of 540 randomly selected sequences. Of the classifiable sequences, 18 phyla were identified, with Proteobacteria, Nitrospira and Euryarchaeota representing the most dominant lineages and accounting for 72%, 12% and 5.1% of all sequences, respectively. Some other phyla were less abundant but still detected in most of the samples; these included Firmicutes (3.4%), Actinobacteria (1.1%) and Acidobacteria (1.1%). At the genus level, the most abundant phylotypes were affiliated with 'Ferrovum' (59 333 sequences), Acidithiobacillus (13 744 sequences), Acidiphilium (9461 sequences) and Leptospirillum (7756 sequences); together these accounted for 69% of the total sequences. Specifically, the Acidithiobacillus sequences were composed predominantly (>95%) of A. ferrooxidans-like organisms, with approximately 4.3% A. caldus. The largest portion of the Leptospirillum reads was affiliated with Leptospirillum ferrodiazotrophum (60%), with the remainder phylogenetically affiliated with L. ferrooxidans (>39%) and L. ferriphilum (0.34%). Additionally, almost all of the Acidiphilium sequences (>98%) were affiliated with Acidiphilium cryptum. The relative abundance of different lineages varied considerably across the AMD communities (Figures 2 and 3 and Supplementary Table 3a; see also the variation in the mean relative abundance of individual lineages across the defined pH levels in Supplementary Tables 3b and 4).
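As a quick check on the genus-level figures above, each genus's relative abundance is its sequence count divided by the 131 720 quality sequences. The sketch below simply repeats that arithmetic with the counts reported in the text. The helper name `relative_abundance` is hypothetical.

```python
TOTAL_SEQUENCES = 131_720
genus_counts = {  # sequence counts reported in the text
    "Ferrovum": 59_333,
    "Acidithiobacillus": 13_744,
    "Acidiphilium": 9_461,
    "Leptospirillum": 7_756,
}


def relative_abundance(count, total):
    """Percentage of all sequences assigned to one taxon."""
    return 100.0 * count / total


for genus, n in genus_counts.items():
    print(f"{genus}: {relative_abundance(n, TOTAL_SEQUENCES):.1f}%")
combined = relative_abundance(sum(genus_counts.values()), TOTAL_SEQUENCES)
print(f"combined: {combined:.1f}%")  # 90 294 / 131 720 -> 68.5%, which rounds to the reported 69%
```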



Relative influence of environmental conditions on 
microbial diversity 

Figure 2 Relative abundances (%) of dominant lineages (phylum level) in overall communities and in different groups of AMD samples along the gradient of pH levels. The numbers above the columns indicate the number of samples in each group. Others include 12 phyla: Bacteroidetes, Chlamydiae, Chloroflexi, Crenarchaeota, Cyanobacteria, Deinococcus-Thermus, Gemmatimonadetes, OD1, OP11, Planctomycetes, TM7 and Verrucomicrobia; and two subphyla of Proteobacteria: Deltaproteobacteria and Epsilonproteobacteria.

Figure 3 Relative abundances of Ferrovum spp., Leptospirillum groups and A. ferrooxidans in different groups of microbial assemblages along the gradient of pH levels in AMD. The numbers within the parentheses indicate the number of samples in each group.

ABT models were used to evaluate the relative importance of environmental conditions and spatial isolation to the diversity patterns of AMD microbial communities in Southeast China. ABT analysis
indicated that pH was the major factor for the patterns of both PD and phylotypes, accounting for approximately 23% and 21% of the relative influence, respectively (Figures 4a and b). Partial dependency plots of pH from the fitted model revealed that high values of the diversity indices were most likely to be observed at higher pH (Supplementary Figures 1a and b). These results were in good agreement with the significantly positive correlations between solution pH and overall diversity determined by MLR analysis (Faith's PD: r = 0.349, P = 0.008; phylotypes: r = 0.359, P = 0.006; for both PD and phylotypes, all environmental variables other than pH were eliminated from the MLR model, see Statistical analyses). Additionally, the ABT analysis revealed a moderate effect of EC and weaker effects of other environmental variables, such as TOC and DO, on both diversity estimates. In comparison, spatial isolation, represented by the gradients of latitude and longitude, contributed less, indicating no obvious effect of geographical distance on AMD microbial diversity. When considering the pairwise community distances between microbial assemblages, the Bray-Curtis distance in pH again had the greatest influence on the weighted UniFrac dissimilarity (Figure 4c). This relationship was corroborated by the Mantel test (Spearman's r = 0.329, P < 0.001) and MLR analysis (r = 0.337, P < 0.001, with the Bray-Curtis dissimilarity of pH as the only variable retained in the MLR model), with greater divergence in solution pH tending to produce higher UniFrac dissimilarity (Supplementary Figure 1c). In contrast, the Mantel test detected no significant correlation between geographical distance and UniFrac distance (Spearman's r = 0.106, P = 0.072), implying that the contribution of spatial isolation to community dissimilarity was limited (Figure 4c).



Relationship between relative abundance of dominant 
lineages and environmental conditions 
Figure 4 Relative influence (%) of environmental properties and spatial distance on phylogenetic diversity (a), phylotypes (b), weighted UniFrac dissimilarity of field data (pyrosequencing) (c) and weighted UniFrac dissimilarity of metadata (meta-analysis) (d), evaluated by ABT models.

Figure 5 Multivariate regression tree analysis of the relation between relative abundance of dominant lineages and environmental parameters in microbial communities of AMD. The bar plots show the mean relative abundance of specific lineages at each terminal node, and the distribution patterns of relative abundance represent the dynamics of community composition at each split. The numbers under the bar plots indicate the number [n] of samples within each group. All values are in mg l⁻¹, except pH, which is in standard units.

Taxonomy-supervised analysis has the advantages of lower computational requirements and greater tolerance of sequencing errors (Sul et al., 2011). We further conducted an MRT analysis using our AMD field data. This analysis related the relative abundance of dominant lineages to environmental conditions via a tree with seven terminal nodes based on pH, TOC and concentrations of sulfate, ferric and ferrous iron, which collectively explained 70% of the standardized abundance variance (Figure 5). The results suggested that spatial isolation represented by sampling location (with longitude and latitude entered as two variables in the MRT model) was less important than environmental variables in explaining the variation in microbial community composition. pH appeared to be a strong predictor of relative lineage abundance, with samples at low pH (pH <2.3) clustering separately from those with moderate pH values. Betaproteobacteria, the most abundant lineage (44 ± 34%) across all communities, showed a strong response to solution pH, with low relative abundance (4.2 ± 9.1%, n = 15) in extremely acidic environments but clear predominance (48 ± 32%, n = 44) in assemblages under moderate pH conditions. This trend was largely attributable to the distribution pattern of 'Ferrovum'-related organisms (Figure 3). In contrast,
other lineages such as Alphaproteobacteria, Euryarchaeota, Gammaproteobacteria and Nitrospira exhibited a distinct adaptation to more acidic environments, increasing in relative abundance. These results were consistent with the overall dynamics of community composition along the pH gradient (Figure 2). Similar patterns were found for other environmental variables in the MRT analysis, with groups dominated almost exclusively by Betaproteobacteria generally separating from those with notable increases in the relative abundance of other lineages (Figure 5). This implies that the optimal growth conditions for Betaproteobacteria (mainly 'Ferrovum' spp.) differ from those for the other lineages.
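At each node, a sum-of-squares multivariate regression tree chooses the split of one predictor that minimizes the total within-group sum of squared deviations of the multivariate response (here, the relative abundances of lineages). The sketch below finds only the first such split on a single predictor. It is a simplified illustration, not mvpart, which also grows, cross-validates and prunes a full tree. The function names, pH values and abundance profiles are invented toy data.

```python
def within_ss(rows):
    """Sum over response columns of squared deviations from the group mean."""
    if not rows:
        return 0.0
    total = 0.0
    for c in range(len(rows[0])):
        col = [r[c] for r in rows]
        mean = sum(col) / len(col)
        total += sum((v - mean) ** 2 for v in col)
    return total


def best_split(x, y, min_size=1):
    """Best binary split `x < threshold` for a multivariate response `y`.
    Returns (threshold, fraction of total sum of squares explained)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    ss_total = within_ss(ys)
    best = None
    for k in range(min_size, len(xs) - min_size + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot split between tied predictor values
        ss = within_ss(ys[:k]) + within_ss(ys[k:])
        if best is None or ss < best[1]:
            best = ((xs[k - 1] + xs[k]) / 2, ss)  # midpoint threshold
    if best is None:
        raise ValueError("no valid split")
    threshold, ss_split = best
    return threshold, 1 - ss_split / ss_total


# Hypothetical samples: pH and [Betaproteobacteria, other lineages] proportions
ph = [1.9, 2.0, 2.1, 2.6, 2.8, 3.0]
abund = [[0.05, 0.90], [0.04, 0.90], [0.06, 0.85],
         [0.60, 0.30], [0.50, 0.40], [0.55, 0.35]]
threshold, r2 = best_split(ph, abund)
print(threshold, round(r2, 3))
```

With these toy profiles, the split falls between pH 2.1 and 2.6, separating the Betaproteobacteria-poor low-pH group from the rest. This mirrors, in miniature, the kind of low-pH versus moderate-pH partition described above.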



Global distribution patterns of microbial diversity 
in AMD and associated environments 
From the 25 molecular inventory studies that met our literature search criteria, 66 samples with data on overall community composition were extracted for the taxon-based analysis in the MRT model. Of these, 45 (which had detailed information on OTUs and their relative abundance) were retained for the phylogenetic-based analysis in the ABT model (Supplementary Tables 5a and b). It should be noted, however, that some parameters were missing from individual studies. This problem could be handled in the ABT and MRT analyses, as these models can deal with different types of response variables (for example, numeric or categorical) with missing values (De'ath, 2002, 2007).

Overall, similar patterns of microbial composition were found in the global AMD and associated systems, with Proteobacteria, Nitrospira and Euryarchaeota as the major groups, despite considerable fluctuations in their relative recovery in the 16S rRNA gene libraries (Supplementary Table 6). Most strikingly, the ABT model showed that pairwise UniFrac distances between the 45 samples were still largely affected by environmental pH (Figure 4d), implying that microbial assemblages from different substrates may have similar community composition under similar pH conditions. Furthermore, the geographical distance between the globally distributed samples (up to 18 000 km) still had less influence on community dissimilarity than pH, supporting environmental variation as the major factor structuring microbial communities, as observed in our pyrosequence data set. Likewise, the MRT analysis using the metadata of 66 samples indicated that microbial community composition was mainly shaped by pH level, with relatively little influence from spatial isolation (Figure 6). However, whereas Betaproteobacteria contributed significantly to the overall microbial distribution in the Chinese AMD environments, the global-scale pH-dependent pattern was largely attributable to the predominance of Euryarchaeota and Nitrospira under relatively low pH conditions (pH <1.9).

Discussion 

A comprehensive survey of AMD microbial diversity 
We characterized the diversity and composition of microbial communities from diverse and geographically separated acidic mining environments in Southeast China. The large number of samples surveyed and the sequencing depth provided by bar-coded pyrosequencing generated an unprecedented volume of AMD microbial 16S rRNA gene sequence data. This far exceeds the total number of sequences reported in previous clone library studies and significantly expands our knowledge of broad trends in microbial distribution in extremely acidic environments. Although most of the AMD communities were sufficiently sampled by pyrosequencing (as suggested by the rarefaction analyses; Supplementary Figure 2), the full extent of microbial diversity in a few samples was not captured. This is unlikely to reflect inflation of diversity estimates by errors introduced during PCR amplification and pyrosequencing (Reeder and Knight, 2010), as such bias should have been limited by our stringent denoising of the data. Similar results have been reported in other extreme habitats such as hydrothermal chimneys (Brazelton et al., 2010) and acidic hot springs (Bohorquez et al., 2012), where numerous rare taxa account for most of the observed diversity. This suggests that microbial diversity could be higher than expected at some sites with complex interactions among environmental variables and microorganisms. Interestingly, although microbial diversity (Faith's PD and phylotypes) generally increased along the solution pH gradient, moderately higher diversity and a relatively uniform distribution pattern were found at the lowest pH level (pH <2.0; Figure 2 and Supplementary Table 7). This may be related to the significantly higher organic carbon content of a few samples in this pH group, as high carbon content with heterogeneous resource conditions has previously been found to promote high species diversity in soil (Zhou et al., 2002) and marine sediments (Stach et al., 2003).
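Rarefaction curves, such as those used above to judge sampling sufficiency, plot expected phylotype richness against the number of sequences drawn. The paper does not say how its curves were computed. One standard analytical option, assumed here, is Hurlbert's expected richness E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of sequences, N_i is the count of phylotype i and n is the subsample size. The function names and counts in the sketch are hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Hurlbert's rarefied richness: expected number of phylotypes in a random
    subsample of n sequences drawn without replacement."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so abundant phylotypes quickly contribute ~1
    return sum(1 - comb(total - ni, n) / denom for ni in counts)


def rarefaction_curve(counts, step):
    """(n, E[S_n]) pairs from n = step up to the full sample size."""
    total = sum(counts)
    sizes = list(range(step, total, step)) + [total]
    return [(n, expected_richness(counts, n)) for n in sizes]


# Hypothetical phylotype counts: one dominant lineage plus several rare taxa
counts = [500, 30, 5, 1, 1, 1]
for n, s in rarefaction_curve(counts, step=100):
    print(n, round(s, 2))
```

A curve that is still climbing at the full sample size, as it is here because of the three singletons, is the signature of a sample whose diversity has not been fully captured.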

Better prediction of microbial diversity patterns by 
solution pH 

Figure 6 Multivariate regression tree analysis of the relation between relative abundance of dominant lineages and environmental parameters in microbial communities of global AMD and associated systems using metadata. pH, latitude and longitude are in standard units. Temperature is in °C and sulfate concentration is in mg l⁻¹.

Previous molecular investigations have documented spatial and seasonal variations in microbial populations in specific AMD environments. Different studies at local scales have identified different major environmental determinants, such as conductivity and rainfall (Edwards et al., 1999), solution pH (Lear et al., 2009) and oxygen gradients (Gonzalez-Toril et al., 2011), which may be due to site-specific geochemical characteristics. Other factors should also be taken into account, in particular the differing resolution of the methods used to evaluate community composition (fluorescent in situ hybridization, community fingerprinting, and so on) and the relatively small number of samples examined. More recent
studies have used extensive genomic sequencing (Denef and Banfield, 2012) and proteomics (Mueller et al., 2010) to investigate how rapid adaptive evolution may have assisted in the maintenance of the dominant populations in local AMD communities, and how the physiologies of the dominant and less abundant organisms change along environmental gradients corresponding with their ecological distribution within the Richmond Mine at Iron Mountain in California. Our pyrosequencing survey
of microbial diversity in multiple geographically 
separated AMD sites across Southeast China has 
provided an initial insight into the large-scale 
distribution patterns of microbes in these unique 
habitats, with greater variability in physical 
and geochemical attributes. Our phylogenetic- and 
taxon-based analyses (ABT and MRT models) 
supported the important idea that similar ecological 
conclusions could be drawn by using these two 
commonly used clustering approaches (Sul et al.. 



2011), and consistently suggested that, notably, 
pH is still the definitive factor structuring AMD 
communities, regardless of the fact that low pH is a 
major feature of these extreme environments and 
the indigenous microorganisms are predominantly 
acidophiles that are well adapted and tolerant 
to these prevailing extreme conditions. More strik- 
ingly, our subsequent meta-analysis of previous 
molecular surveys of AMD systems in diverse 
geographical locations also revealed similar pH- 
dependent patterns in microbial diversity at a global 
scale despite the long-distance separation and 
regardless of distinct substrate types. These consis- 
tent results were presumably due to the strong 
selective pressures with extremely acidic conditions 
that primarily determine which lineages can survive 
there. Indeed, optimum pH for growth can vary 
significantly among cultivated acidophilic species 
or even between phylogenetically highly similar 
(as evidenced by 16S rRNA sequence comparison) 
microorganisms isolated from different acidic 
mining environments (Edwards et al., 2000; 
Golyshina et al., 2000). Furthermore, recent quanti- 
tative proteomic analyses of the response of 
acidophilic microbial communities to different 
pH conditions has suggested pH-specific niche 
partitioning of prokaryotes and confirmed the 
importance of pH and related geochemical factors 
in fine-tuning acidophilic microbial community 
structure and function (Belnap et al., 2011). 

Hot springs have also been targets for the study of microbial diversity and adaptation in extreme environments. Although some studies have suggested that environmental temperature is a primary factor controlling the structure and dynamics of microbial communities (Ward et al., 1998; Miller et al., 2009), microbial community composition differs significantly between the alkaline (Miller et al., 2009) and acidic (Mathur et al., 2007) hot springs in Yellowstone National Park (WY, USA). Furthermore, acidic hot springs with high sulfate concentrations (Stout et al., 2009) and iron-rich conditions (Mathur et al., 2007) were dominated by iron-oxidizing acidophiles such as Acidithiobacillus spp., irrespective of the wide range of temperatures. The similar distribution and adaptation of such acidophiles in acidic hot springs and acid mine water suggest that pH may be a common determinant structuring the microbial communities in these two different extreme environments. These results are potentially important because recent studies have repeatedly identified pH as one of the general selective pressures operating in 'normal' environments such as soil (Fierer and Jackson, 2006; Lauber et al., 2009; Rousk et al., 2010; Griffiths et al., 2011).



Other potential factors explaining community 
differences 

Several recent meta-analyses have identified salinity as the primary environmental factor shaping the ecological distribution of prokaryotic taxa along broad environmental gradients and across different habitat types (Lozupone and Knight, 2007; Tamames et al., 2010). It is striking that a general environmental property such as salinity still primarily determines the microbial composition of extreme environments, where organisms are presumably under other strong selective pressures (for example, temperature and pH). In our ABT models, salinity (as reflected by EC values) had a weaker effect than solution pH. However, EC and pH were significantly negatively correlated (Spearman's r = -0.631, P < 0.001), and the combination of pH, EC and their interaction was the best MLR model for predicting microbial community diversity (Faith's PD: r = 0.505, P = 0.006; Phylotypes: r = 0.478, P = 0.002; MLR model including the pH × EC interaction). This implies that salinity may also be important in structuring AMD microbial communities, as documented in a previous fluorescent in situ hybridization-based survey of an acid-generating site at Iron Mountain in California (Edwards et al., 1999).
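
Two calculations carry the salinity argument above: a Spearman rank correlation between EC and pH, and an MLR model with pH, EC and their product (pH × EC) as predictors. The dependency-free Python sketch below shows one way to compute both. The function names are hypothetical, the fit solves the ordinary least-squares normal equations directly rather than reproducing whatever statistical package the authors used, and the P values reported in the text would additionally require the appropriate null distributions.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # convert 0-based positions
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ph_ec_interaction(ph, ec, y):
    """Least squares for y = b0 + b1*pH + b2*EC + b3*(pH*EC).

    Returns the coefficients [b0, b1, b2, b3] and the model R^2.
    """
    X = [[1.0, p, e, p * e] for p, e in zip(ph, ec)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return coefs, 1 - ss_res / ss_tot
```

The interaction column pH × EC lets the effect of pH on diversity change with salinity, which a model with only additive pH and EC terms cannot express.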

Dispersal limitation and past environmental conditions can lead to genetic divergence of microbial assemblages from geographically separated sites (Martiny et al., 2006). Although our results suggested that the overall microbial diversity patterns were better predicted by contemporary environmental variation, a moderate influence of geographical isolation on the UniFrac dissimilarity of the Acidithiobacillus and Leptospirillum genera could still be found (although no significant correlation was detected by the Mantel test), indicating that the observed patterns could be taxon- or lineage-dependent, or might vary with the level of analytical and phylogenetic resolution. It should also be noted that the relative influence of historical versus environmental factors may depend on the scale of sampling (Martiny et al., 2006), as exemplified by the inter- and intracontinental surveys of hot spring Synechococcus and Sulfolobus assemblages (Papke et al., 2003; Whitaker et al., 2003).
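
The Mantel test mentioned above asks whether two distance matrices over the same sites, here geographical distance and UniFrac dissimilarity, are correlated, using random relabelling of the sites to build a null distribution. The Python sketch below implements one common variant (Pearson correlation of the upper triangles with a one-sided permutation P value). The function names, the default of 999 permutations and the fixed random seed are assumptions for illustration, not the settings used in this study.

```python
import random


def upper_triangle(m):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): the observed matrix correlation and a one-sided
    permutation P value from jointly permuting the rows and columns of d2."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    n = len(d2)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel the sites of d2: the same permutation applies to rows and columns.
        shuffled = [[d2[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(shuffled)) >= r_obs:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole sites, rather than individual matrix cells, preserves the non-independence of pairwise distances that share a site, which is why the ordinary P value of a correlation coefficient is not used for distance matrices.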



Taxonomic distribution pattern of dominant species 
in AMD microbial communities 

Taxonomic classification of our pyrosequences revealed that Betaproteobacteria were ubiquitous and dominant across the AMD sites, especially under moderate pH conditions (pH > 2.4; Figures 2 and 5). The most abundant phylotype, which accounted for approximately 91% of all Betaproteobacteria reads, was affiliated with the recently described genus 'Ferrovum' (Hallberg et al., 2006). Although poorly characterized, 'Ferrovum' spp. are thought to be obligate autotrophs that grow only by ferrous iron oxidation and are less acid-tolerant than the well-studied AMD species L. ferrooxidans and A. ferrooxidans (Rowe and Johnson, 2008; Hallberg, 2010). Our pyrosequencing survey of diverse AMD sites supports the acid susceptibility of these 'Ferrovum'-affiliated organisms and their preference for relatively high ferrous iron concentrations (Figure 5; Heinzel et al., 2009b). More importantly, their wide distribution and dominance were conspicuous across the geochemically distinct mining environments examined in this study (Figure 3) and at other acidic sites (Brockmann et al., 2010; Brown et al., 2011; Kimura et al., 2011), implying an important ecological role in extreme AMD systems. Notably, although pH was identified as the primary force shaping their large-scale ecological range (across Southeast China), EC was found to influence their local distribution at the YunFu mine (r = -0.76, P = 0.018, MLR with all other independent variables excluded), where a relatively large number of samples was available for analyzing local patterns; this implies potential effects of site-specific geochemical characteristics. The interactions between 'Ferrovum' spp. and other populations over a continuum of spatial and temporal scales within entire regions merit further study (Ricklefs, 2008).

In contrast to the high relative abundance of 'Ferrovum' spp. under moderate pH conditions, A. ferrooxidans and Leptospirillum groups (acidophiles widely implicated in AMD production) were more dominant, and thus likely to play significant roles, under more acidic conditions (Figure 3). Leptospirillum spp. and A. ferrooxidans showed similar distributions along the pH gradient, with an optimum pH range of 2.0-2.4 and a significant decrease in relative abundance when the pH rose above 3.0 (Figure 3). However, the ratio of Leptospirillum spp. to A. ferrooxidans was relatively high at higher Fe³⁺/Fe²⁺ ratios, that is, at higher redox potential (t-test for independent samples, P < 0.01), indicating potential competition for the energy source (ferrous iron) and supporting the greater Fe²⁺ affinity and lower sensitivity to Fe³⁺ inhibition of Leptospirillum spp. (Rawlings et al., 1999).
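
The Leptospirillum-to-A. ferrooxidans comparison above uses a t-test for independent samples. As a hedged illustration, the Python sketch below computes the classical Student's t statistic with pooled variance for two groups of per-sample ratios; the group values are hypothetical, and whether the authors assumed equal variances is not stated, so Welch's unequal-variance version may be the more appropriate choice in practice. The two-sided P value needs the CDF of the t distribution, which the standard library lacks; `scipy.stats.ttest_ind(a, b)` returns both the statistic and the P value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic assuming equal variances.

    Returns (t, degrees_of_freedom); variance() is the sample (n - 1) variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical Leptospirillum : A. ferrooxidans read ratios, grouped by whether
# a sample's Fe3+/Fe2+ ratio lies above or below some chosen cutoff.
high_redox = [4.0, 5.0, 6.0]
low_redox = [1.0, 2.0, 3.0]
t_stat, df = pooled_t_test(high_redox, low_redox)
```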

The biogeographic patterns of less abundant AMD taxa and their driving forces remain unresolved. Although heterotrophic bacteria such as Acidiphilium were widely detected across the AMD samples, they were generally present in significantly smaller proportions than the acidophilic autotrophs, and no clear statistical correlation was found between TOC and the relative abundance of the detected heterotrophs, suggesting that external inputs of fixed carbon contribute little to maintaining the indigenous microbial assemblages. Archaea accounted for a non-negligible proportion (average >5.0%) of the AMD communities; however, only a small fraction of these sequences could be confidently assigned at the genus level, mostly to Ferroplasma, even though recent studies have suggested that acidophilic Archaea related to the ferrous-iron-oxidizing F. acidiphilum are numerically significant, and thus ecologically important, in some acidic environments in diverse geographical locations (Golyshina and Timmis, 2005; Huang et al., 2011). In general, no obvious distribution patterns of Archaea or Ferroplasma spp. were observed along either geographical distance or environmental properties in our data set (no independent variables were retained in the MLR model). Additionally, the neutrophilic Gallionella spp., which are commonly found in AMD habitats, were barely detected (two phylotypes with 33 reads) across our diverse AMD sites. Acid susceptibility may not be the main reason for their absence from the Chinese AMD environments, as these iron oxidizers have been detected at a few other acidic sites with pH values as low as 2.6-3.0 (Hallberg et al., 2006; Heinzel et al., 2009a; Kimura et al., 2011).



Summary and prospect 

We report the most comprehensive analysis of the geographical distribution of AMD microbes to date, revealing environmentally dependent patterns at both regional and global scales, with a smaller contribution from spatial separation. Although pH is identified as the major factor shaping microbial communities across the range of acidic habitats examined in this study and the 'normal' environments surveyed in other studies (Lauber et al., 2009; Griffiths et al., 2011), the underlying mechanisms remain unresolved. Trait-based or functional biogeography assessed by metagenomics (Raes et al., 2011) or comprehensive functional gene arrays such as GeoChip (He et al., 2010) represents a promising strategy to address this issue. In our pyrosequence data set, we could not distinguish active community members from dormant taxa, which may be metabolically inactive because of unfavorable environmental conditions and thus less sensitive to environmental change. Future investigations focusing specifically on the active members of the community will likely reveal even more pronounced environmentally dependent patterns of microbial diversity. Such knowledge is critical, as these active taxa presumably play crucial roles in the functioning of AMD ecosystems. Additionally, evaluating microbial population dynamics at fine temporal scales and over relatively long periods at a diverse array of AMD sites would bring novel insight and greater predictive power to the study of microbial diversity patterns in these extreme environments.